Python Numpy Arrays

In this Notebook, we learn how to build and solve systems of linear equations, and apply these techniques to solve practical problems.

Python is a general-purpose programming language, but specialized features geared towards data analysis are available in various libraries, such as numpy. Numpy's main data structure is the ndarray, which supports operations such as adding rows or columns of data in an element-wise fashion and performing math operations on multidimensional matrices. Numpy (and ndarrays) are the subject of this notebook.

Ndarrays are similar to lists in that they consist of a collection of items that can be accessed via indexes. Unlike lists, however, ndarrays are homogeneous: they only contain objects of the same type, whereas lists can be heterogeneous.
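As a small illustration of the homogeneity point above, the sketch below builds arrays from mixed-type lists and inspects the resulting dtype. The variable names are made up for this example; the exact string dtype width may vary, so only the dtype kind is checked.

```python
import numpy as np

# A list can hold items of different types side by side.
mixed_list = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed_list])

# An ndarray picks one common type: ints mixed with floats become floats.
int_float = np.array([1, 2.5])
print(int_float.dtype)

# Mixing numbers and strings converts every item to a string.
with_str = np.array([1, "three"])
print(with_str.dtype.kind)   # 'U' stands for a Unicode string dtype
```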

To work with ndarrays, we need to load the numpy library.


In [1]:
import numpy as np

We can create an ndarray by passing a list to the np.array() function:


In [2]:
list1 = [1, 2, 3, 4, 5]          # Define a list

In [3]:
array1 = np.array(list1)         # Pass the list to np.array()

In [4]:
type(array1)                     # Check the object's type


Out[4]:
numpy.ndarray

In [5]:
print("array1 = ", array1)       # Check the content of the array (Python 3 syntax; under Python 2 this prints a tuple, as below)


('array1 = ', array([1, 2, 3, 4, 5]))

In [6]:
print("array1 = %s" % np.array_str(array1)) # Check the content of the array (works in Python 2 and 3)


array1 = [1 2 3 4 5]

To create an array with more than one dimension, we can pass a nested list to the np.array() function:


In [47]:
list2 = [[1,2,3,4,5], [6,7,8,9,10]]
array2 = np.array(list2)

print("array2 = ", array2)       # Python 3


('array2 = ', array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]]))

In [48]:
print("array2 = %s" % np.array_str(array2))   # Python 2 and 3


array2 = [[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

The properties of an ndarray include the number of dimensions it has, the size of each dimension and the type of data it contains. We can check the dimensions of an ndarray with the shape attribute:


In [9]:
array2.shape


Out[9]:
(2, 5)

The output above shows that array2 is a 2-dimensional array with 2 rows and 5 columns. We can check the size (total number of items) of an array with the size attribute and the type of the data it contains with the dtype attribute:


In [10]:
print("array2 has", array2.size ,"items of type", array2.dtype)             # Python 3


('array2 has', 10, 'items of type', dtype('int64'))

In [11]:
print ("array2 has %d items of type %s" % (array2.size, array2.dtype))      # Python 2 and 3


array2 has 10 items of type int64
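The attributes above can be checked together, and the data type can also be chosen explicitly. The following is a minimal sketch; the names as_float and converted are illustrative only. The default integer dtype depends on the platform and numpy version, so int64 in the output above should not be assumed everywhere.

```python
import numpy as np

array2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(array2.ndim)    # Number of dimensions
print(array2.shape)   # Size of each dimension
print(array2.size)    # Total number of items

# Request a dtype when creating the array...
as_float = np.array([[1, 2], [3, 4]], dtype=np.float64)

# ...or convert an existing array (astype returns a new array).
converted = array2.astype(np.float32)
print(as_float.dtype, converted.dtype)
```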

Numpy has several functions for creating arrays. For example, np.identity() creates a square 2-d array with 1's on the main diagonal and 0's everywhere else:


In [12]:
np.identity(n = 3)      # n is the size of the square 2-d array


Out[12]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

np.eye() creates a 2-d array with 1's along a specified diagonal and 0's everywhere else:


In [13]:
np.eye(3,  # Number of rows
       5,  # Number of columns
       1)  # Index of the diagonal (main diagonal, 0, is the default)


Out[13]:
array([[ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.]])

In [14]:
# np.ones() to create an array filled with ones:
np.ones(shape= [2,3])


Out[14]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [15]:
# np.zeros() to create an array filled with zeros:
np.zeros(shape= [3,4])


Out[15]:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
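A few further variations on the creation functions above, as a hedged sketch: np.eye() with default arguments matches np.identity(), a negative diagonal index moves the 1's below the main diagonal, and the dtype argument overrides the float default. np.full() is not used elsewhere in this notebook; it is included here only as a related helper.

```python
import numpy as np

# With a single argument, np.eye() gives the same result as np.identity().
same = np.array_equal(np.identity(3), np.eye(3))
print(same)

# k > 0 selects a diagonal above the main one, k < 0 one below it.
upper = np.eye(3, 5, k=1)
lower = np.eye(3, 5, k=-1)
print(lower)

# np.ones() and np.zeros() return floats unless a dtype is given.
int_ones = np.ones((2, 3), dtype=int)

# np.full() fills an array with an arbitrary constant.
sevens = np.full((2, 2), 7)
print(sevens)
```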

Array Indexing and Slicing

Numpy ndarrays offer numbered indexing and slicing syntax that mirrors the syntax for Python lists:


In [16]:
d_array = np.array([1,2,3,4,5,6])

d_array[2] # Get the item at index 2


Out[16]:
3

In [17]:
d_array[4:]       # Get a slice from index 4 to the end


Out[17]:
array([5, 6])

In [18]:
d_array[::-1]     # shortcut to reverse the array


Out[18]:
array([6, 5, 4, 3, 2, 1])

If an ndarray has more than one dimension, separate indexes for each dimension with a comma:


In [19]:
# Create a new two dimensional array
dd_array = np.array([d_array, d_array + 10, d_array + 100])

print(dd_array)


[[  1   2   3   4   5   6]
 [ 11  12  13  14  15  16]
 [101 102 103 104 105 106]]

In [20]:
# Get the element at row index 2 and column index 3
dd_array[2, 3]


Out[20]:
104

In [21]:
# Slice elements starting at row 1 and column 4
dd_array[1:, 4:]


Out[21]:
array([[ 15,  16],
       [105, 106]])

In [22]:
#Reverse the array in both dimensions (rotation)
dd_array[::-1, ::-1]


Out[22]:
array([[106, 105, 104, 103, 102, 101],
       [ 16,  15,  14,  13,  12,  11],
       [  6,   5,   4,   3,   2,   1]])
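Although the slicing syntax mirrors that of lists, one difference is worth keeping in mind: a basic slice of an ndarray is a view onto the same data, while a list slice is a new list. A minimal sketch (the names view, list_slice and independent are illustrative):

```python
import numpy as np

d_array = np.array([1, 2, 3, 4, 5, 6])

# Slicing an ndarray returns a view: writing to it changes the original.
view = d_array[4:]
view[0] = 50
print(d_array)        # [ 1  2  3  4 50  6]

# Slicing a list returns a new list: the original is untouched.
d_list = [1, 2, 3, 4, 5, 6]
list_slice = d_list[4:]
list_slice[0] = 50
print(d_list)         # [1, 2, 3, 4, 5, 6]

# Use .copy() when an independent array is needed.
independent = d_array[4:].copy()
independent[1] = 60
print(d_array[5])     # 6
```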

Reshaping Arrays

np.reshape() reshapes an array into a new one with the same data but a different structure:


In [23]:
np.reshape(dd_array,        # Array to reshape
           (2, 9))          # Dimensions of the new array (passed positionally; newer numpy versions deprecate the newshape keyword)


Out[23]:
array([[  1,   2,   3,   4,   5,   6,  11,  12,  13],
       [ 14,  15,  16, 101, 102, 103, 104, 105, 106]])
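Two details of reshaping that the cell above does not show, sketched under the assumption of a simple 3 x 6 example array: one dimension can be given as -1 and numpy infers it, and a new shape whose total size differs from the original raises a ValueError.

```python
import numpy as np

dd_array = np.arange(18).reshape(3, 6)   # Example array with 18 items

# -1 asks numpy to work out that dimension from the total size.
inferred = dd_array.reshape(2, -1)
print(inferred.shape)                    # (2, 9)

# The new shape must hold exactly the same number of items.
reshape_failed = False
try:
    dd_array.reshape(4, 5)               # 20 slots for 18 items
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)
```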

Unravel a multi-dimensional array into 1 dimension with np.ravel():


In [24]:
np.ravel(dd_array,          # Array to reshape
         order='C')         # Unravel by rows


Out[24]:
array([  1,   2,   3,   4,   5,   6,  11,  12,  13,  14,  15,  16, 101,
       102, 103, 104, 105, 106])

In [25]:
np.ravel(dd_array,
         order='F')         # Unravel by columns


Out[25]:
array([  1,  11, 101,   2,  12, 102,   3,  13, 103,   4,  14, 104,   5,
        15, 105,   6,  16, 106])

In [26]:
dd_array.flatten()       # Flatten a multi-dimensional array into 1 dimension and return a copy of the result


Out[26]:
array([  1,   2,   3,   4,   5,   6,  11,  12,  13,  14,  15,  16, 101,
       102, 103, 104, 105, 106])
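np.ravel() and flatten() give the same values here, but they differ in whether the result shares memory with the original. The sketch below assumes a contiguous example array, for which np.ravel() returns a view; for non-contiguous input it may have to copy.

```python
import numpy as np

dd_array = np.arange(18).reshape(3, 6)   # Contiguous example array

raveled = np.ravel(dd_array)             # View when possible
flattened = dd_array.flatten()           # Always a copy

print(np.shares_memory(dd_array, raveled))     # True
print(np.shares_memory(dd_array, flattened))   # False

# Writing through the view changes the original; the copy is unaffected.
raveled[0] = -1
print(dd_array[0, 0], flattened[0])      # -1 0
```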

In [27]:
dd_array.T  #get the transpose


Out[27]:
array([[  1,  11, 101],
       [  2,  12, 102],
       [  3,  13, 103],
       [  4,  14, 104],
       [  5,  15, 105],
       [  6,  16, 106]])

In [28]:
np.flipud(dd_array) #Flip an array vertically


Out[28]:
array([[101, 102, 103, 104, 105, 106],
       [ 11,  12,  13,  14,  15,  16],
       [  1,   2,   3,   4,   5,   6]])

In [29]:
np.fliplr(dd_array) #Flip an array horizontally


Out[29]:
array([[  6,   5,   4,   3,   2,   1],
       [ 16,  15,  14,  13,  12,  11],
       [106, 105, 104, 103, 102, 101]])

In [30]:
np.rot90(dd_array,    # Rotate the array 90 degrees counter-clockwise 
         k=1)         # Number of 90 degree rotations


Out[30]:
array([[  6,  16, 106],
       [  5,  15, 105],
       [  4,  14, 104],
       [  3,  13, 103],
       [  2,  12, 102],
       [  1,  11, 101]])

In [31]:
np.roll(dd_array,   # Shift elements in an array along a given dimension
        shift = 2,        # Shift elements 2 positions
        axis = 1)         # In each row


Out[31]:
array([[  5,   6,   1,   2,   3,   4],
       [ 15,  16,  11,  12,  13,  14],
       [105, 106, 101, 102, 103, 104]])

In [32]:
np.roll(dd_array,   # With no axis argument, the array is flattened, shifted, then restored to its original shape
        shift = 2)


Out[32]:
array([[105, 106,   1,   2,   3,   4],
       [  5,   6,  11,  12,  13,  14],
       [ 15,  16, 101, 102, 103, 104]])
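The behaviour of np.roll() without an axis can be reproduced by hand, which makes the output above easier to read. A minimal sketch; the name manual is illustrative:

```python
import numpy as np

d_array = np.array([1, 2, 3, 4, 5, 6])
dd_array = np.array([d_array, d_array + 10, d_array + 100])

# No axis: equivalent to flattening, shifting, and reshaping back.
rolled = np.roll(dd_array, shift=2)
manual = np.roll(dd_array.ravel(), 2).reshape(dd_array.shape)
print(np.array_equal(rolled, manual))    # True

# axis=0 shifts whole rows instead of items within a row.
rows_shifted = np.roll(dd_array, shift=1, axis=0)
print(rows_shifted[0])                   # [101 102 103 104 105 106]
```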

In [33]:
#Join arrays along an axis 

array_to_join = np.array([[10,20,30],[40,50,60],[70,80,90]])

np.concatenate((dd_array,array_to_join),      # Arrays to join
               axis=1)                        # Axis to join upon


Out[33]:
array([[  1,   2,   3,   4,   5,   6,  10,  20,  30],
       [ 11,  12,  13,  14,  15,  16,  40,  50,  60],
       [101, 102, 103, 104, 105, 106,  70,  80,  90]])
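When joining arrays, every dimension except the one being joined must match. The sketch below uses small example arrays (a, b and c are made-up names) to show both axes and the error raised on a mismatch.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

stacked_rows = np.concatenate((a, b), axis=0)   # b goes below a
stacked_cols = np.concatenate((a, b), axis=1)   # b goes to the right of a
print(stacked_rows.shape, stacked_cols.shape)   # (4, 3) (2, 6)

# a has 2 rows and c has 3, so they cannot be joined side by side...
c = np.ones((3, 3))
mismatch = False
try:
    np.concatenate((a, c), axis=1)
except ValueError as err:
    mismatch = True
    print("Cannot join:", err)

# ...but they can be joined along axis 0, where only the column counts must agree.
print(np.concatenate((a, c), axis=0).shape)     # (5, 3)
```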

Array Math Operations

Numpy arrays support element-wise mathematical operations through the usual math operators, such as +, -, / and *:


In [34]:
dd_array + 10    # Add 10 to each element


Out[34]:
array([[ 11,  12,  13,  14,  15,  16],
       [ 21,  22,  23,  24,  25,  26],
       [111, 112, 113, 114, 115, 116]])

In [35]:
dd_array - 10    # Subtract 10 from each element


Out[35]:
array([[-9, -8, -7, -6, -5, -4],
       [ 1,  2,  3,  4,  5,  6],
       [91, 92, 93, 94, 95, 96]])

In [36]:
dd_array * 2      # Multiply each element by 2


Out[36]:
array([[  2,   4,   6,   8,  10,  12],
       [ 22,  24,  26,  28,  30,  32],
       [202, 204, 206, 208, 210, 212]])

In [37]:
dd_array ** 2      # Square each element


Out[37]:
array([[    1,     4,     9,    16,    25,    36],
       [  121,   144,   169,   196,   225,   256],
       [10201, 10404, 10609, 10816, 11025, 11236]])
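Division is listed among the operators above but not demonstrated. The following sketch assumes Python 3, where / is true division and returns floats even for integer arrays; // and % give the integer quotient and remainder. The array arr is a made-up example.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

true_div = arr / 2     # True division: the result is a float array
floor_div = arr // 2   # Floor division: the result stays integer
remainder = arr % 2    # Element-wise remainder

print(true_div)
print(floor_div)
print(remainder)
```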

One can also use the basic math operators on two arrays with the same shape. The basic math operators function in an element-wise fashion, returning an array with the same shape as the original.


In [38]:
array3 = np.array([[1,2],[3,4]])

array3 + array3


Out[38]:
array([[2, 4],
       [6, 8]])

In [39]:
array3 - array3


Out[39]:
array([[0, 0],
       [0, 0]])

In [40]:
array3 * array3


Out[40]:
array([[ 1,  4],
       [ 9, 16]])
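Because * works element by element, it is not matrix multiplication. As a short sketch of the difference, the @ operator (available in Python 3.5 and later) computes the matrix product of the same array:

```python
import numpy as np

array3 = np.array([[1, 2], [3, 4]])

elementwise = array3 * array3   # Each item multiplied by itself
matrix_prod = array3 @ array3   # Rows times columns

print(elementwise)   # [[ 1  4] [ 9 16]]
print(matrix_prod)   # [[ 7 10] [15 22]]
```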

Numpy also provides math functions for ndarrays such as:


In [41]:
np.mean(dd_array) # The mean of all the elements in an array


Out[41]:
40.166666666666664

In [42]:
np.std(dd_array)   # Get the standard deviation of all the elements in an array


Out[42]:
45.001543183416373

In [43]:
np.sum(dd_array, 
       axis=1)        # Get the row sums for the elements of an array


Out[43]:
array([ 21,  81, 621])

In [44]:
np.sum(dd_array,
       axis=0)        # Get the column sums


Out[44]:
array([113, 116, 119, 122, 125, 128])
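The axis argument works the same way for other reductions: axis=1 collapses the columns and gives one value per row, axis=0 gives one value per column. A minimal sketch follows; the ddof argument to np.std() is shown as an aside, since np.std() computes the population standard deviation by default.

```python
import numpy as np

d_array = np.array([1, 2, 3, 4, 5, 6])
dd_array = np.array([d_array, d_array + 10, d_array + 100])

row_means = np.mean(dd_array, axis=1)   # One mean per row
col_max = np.max(dd_array, axis=0)      # One maximum per column
print(row_means)                        # [  3.5  13.5 103.5]
print(col_max)                          # [101 102 103 104 105 106]

# ddof=1 divides by (n - 1) instead of n (sample standard deviation).
population_std = np.std(dd_array)
sample_std = np.std(dd_array, ddof=1)
print(population_std < sample_std)      # True
```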

In [45]:
np.sqrt(dd_array) # Take the square root of each element in the array


Out[45]:
array([[  1.        ,   1.41421356,   1.73205081,   2.        ,
          2.23606798,   2.44948974],
       [  3.31662479,   3.46410162,   3.60555128,   3.74165739,
          3.87298335,   4.        ],
       [ 10.04987562,  10.09950494,  10.14889157,  10.19803903,
         10.24695077,  10.29563014]])

np.dot() returns the dot product of two arrays.


In [46]:
np.dot(dd_array[0,0:],  # Slice row 0
       dd_array[1,0:])  # Slice row 1


Out[46]:
301
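For two 1-d arrays the dot product is the sum of the element-wise products, which is where the 301 above comes from. With a 2-d array and a 1-d array, np.dot() computes one dot product per row. A small sketch (the names row0, row1 and per_row are illustrative):

```python
import numpy as np

d_array = np.array([1, 2, 3, 4, 5, 6])
dd_array = np.array([d_array, d_array + 10, d_array + 100])

row0 = dd_array[0]
row1 = dd_array[1]

# 1-d dot product: multiply element-wise, then add everything up.
by_hand = np.sum(row0 * row1)
print(by_hand, np.dot(row0, row1))   # 301 301

# 2-d array times 1-d array: one dot product for each row of dd_array.
per_row = np.dot(dd_array, row0)
print(per_row)
```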

The numpy package also includes a variety of more advanced linear algebra functions, such as computing eigenvectors and eigenvalues or inverting matrices.
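These functions live in the np.linalg module. The sketch below uses a small made-up matrix A and vector b to illustrate np.linalg.inv(), np.linalg.eig() and np.linalg.solve(); results are checked with np.allclose() because floating-point output is rarely exact.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Matrix inverse: A times its inverse is the identity matrix.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))

# Eigenvalues and eigenvectors (one eigenvector per column).
eigenvalues, eigenvectors = np.linalg.eig(A)
# Each column v satisfies A v = lambda v.
print(np.allclose(A @ eigenvectors, eigenvectors * eigenvalues))

# Solve the linear system A x = b without forming the inverse.
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)
```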